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Abstract 

Singular Spectrum Analysis (SSA) as a tool for analysis and forecasting of 
time series is considered. The main features of the Rssa package, which 
implements the SSA algorithms and methodology in R, are described and 
examples of its use are presented. Analysis, forecasting and parameter es- 
timation are demonstrated by means of case study with an accompanying 
code in R. 
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1. Introduction 



Singular Spectrum Analysis (SSA) is a developed methodology of time se- 
ries analysis and forecasting which comprises many different but inter-linked 



methods. There are several books devoted to SSA (Eisner and Tsonis, 1996 
Golyandina et al. , 2001 Golyandina and Zhigljavsky| 2012) as well as many 



papers related to the theory of SSA but especially to various apphcations of 
SSA (see Golyandina and Zhigljavsky (2012) for references). The scope of 



applications of SSA is very wide, from non-parametric time series decompo- 
sition and filtration to parameter estimation and forecasting. 

However, any method needs effective, comfortable and accessible imple- 
mentation. There are many implementations of SSA. They differ by the 
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potential application areas, implemented methodology, interactive on non- 
interactive form, free or commercial use, computer system (Windows, Unix, 
Mac), level of reliability and support. The most known supported software 
packages realizing SSA are the following: 

1. http://gistatgroup.com: 

general-purpose interactive 'Caterpillar'-SSA software (Windows) fol- 



lowing the methodology described in Golyandina et al. (2001); Golyan 



dina and Zhigljavsky (2012); 



http : / /www. atmos .ucla.edu/tcd/ ssa/: 

oriented mainly on climatic applications SSA-MTM Toolkit for spec- 



tral analysis ( Ghil et al. , 2002 ) (Unix) and its commercial extension 



kSpectra Toolkit (Mac), interactive; 

The commercial statistical software, SAS, includes SSA to its economet- 



ric extension SAS/ETS®Software based on methodology of Golyand- 
ina et al.| ( |200l| ). 



4. http : / / cran . r-pro j ect . org/web/packages/Rssa/ : 



R-package Rssa (Golyandina et al. 2001 Korobeynikov 2010), a fast 



implementation of the main SSA procedures for any platform, exten- 
sively developed. 

This paper is aimed to show how the methodology of the SSA analy- 
sis, forecasting and parameter estimation can be implemented with the help 
of the package Rssa. We start with description of SSA methodology (Sec- 
tion [2]), description of Rssa structure and features of Rssa implementation 
(Section |3]). These sections provide the information necessary for the proper 
use of Rssa functions and objects and proper application to real-life data. 
The description of both the SSA methodology and the Rssa package is not 
complete; much more information on SSA can be found in in [Golyandina 
et al. (2001); Golyandina and Zhigljavsky (2012) while technical description 



of the Rssa functions can be found in the help files in the package itself. 

Sections |4] and |5] contain examples of typical codes for the analysis and 
forecasting, correspondingly. Each section contains simple example and also 
case study. The examples demonstrate how to decompose time series into 
trend, periodic component and noise, how to choose SSA parameters, how to 
estimate signal parameters (e.g. frequencies), how to perform forecasting and 
check its accuracy. In Section |6] we shortly discuss the future development of 
Rssa. 
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2. SSA algorithms and methodology 



In this section we gather the information about SSA which is vital for 
understanding the implementation of SSA and the ways of proper use of 
SSA for the analysis of real-hfe data. 

2.1. Analysis 

Consider a real- valued time series Xn = {xi, . . . , xn) of length N. Let L 
{1 < L < N) he some integer called window length and K — N — L + 1. 

The algorithm of SSA consists of two complementary stages: decomposi- 
tion and reconstruction. 

2.1.1. First Stage: Decomposition 

1st step: Embedding. To perform the embedding we map the original time 
series into a sequence of lagged vectors of size L by forming K — N — L + 1 
lagged vectors 

Xi = {xi, Xi+L-i)'^, i^l...,K. 
The trajectory matrix of the series Xjv is 

/ 
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There are two important properties of the trajectory matrix, namely (a) 
both the rows and columns of X are subseries of the original series and (b) 
X has equal elements on anti-diagonals and therefore the trajectory matrix 
is Hankel. 

2nd step: Decomposition. Let {Pijf^^ be an orthonormal basis in R-^. 
Consider the decomposition of the trajectory matrix 



X = J]P,Q7 = Xi + ... + Xi„ 



(2) 



where = X^P,. = ||g,f. 

We consider two choices of the basis {Pi}i=i. 
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(A) Basic: {Pi}f^i are eigenvectors of XX""". 

(B) Toeplitz: {Pi}f= I are eigenvectors of C with entries 

^ N-\i-j\ 
I* -^1 m=l 

In both cases the eigenvectors are ordered so that the corresponding eigen- 
values are placed in the decreasing order. 

Let us remark that Case A corresponds to Singular Value Decomposition 
(SVD) of X, that is, X = ^ . VkUiV^, Pi = Ui, Qi = ^/XlVi, Vi are called 
factor vectors, Aj are eigenvalues of XX""", therefore, Ai > . . . > > 0. 

Note also that Case B is only suitable for analysis of stationary time series 



with zero mean (see e.g. Golyandina (2010)) 



Triple (a/A^, Pi,Qi) will be called zth eigentriple (abbreviated as ET) 
2.1.2. Second Stage: Reconstruction 

3rd step: Eigentriple grouping. Let d = min{j : Aj = > j}. Once 
the expansion (|2| is obtained, the grouping procedure partitions the set of 
indices {1, . . . ,d} into m disjoint subsets /i, . . . , Im- 

Define X/ = X]i6/-^j- The expansion ^ leads to the decomposition 

X = X,, + ... + X,„. (3) 

The procedure of choosing the sets Ii, . . . ,Im is called eigentriple grouping, 
li m = d and Ij = {j}, j = 1,. . . ,d, then the corresponding grouping is 
called elementary. 

4th step: Diagonal averaging. At this step, we transform each matrix 
X/^ of the grouped decomposition ([3| into a new series of length A^. Let Y 
be an L X matrix with elements yij,l<i<L,l<j<K, and let for 
simplicity L < K. By making the diagonal averaging we transfer the matrix 
Y into the series {yi, . . . ,yj\f) using the formula 

where Ag = {(/, k) : I + k = s + 1,1 < I < L,l < k < K} and \As\ denotes 
the number of elements in the set Ag. This corresponds to averaging the 
matrix elements over the "antidiagonals" . 
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Diagonal averaging Q applied to a resultant matrix X/^. produces a recon- 
structed series 'K^^^ = (^\ . . . , x^^^). Therefore, the initial series (xi, . . . , xat) 
is decomposed into a sum of m reconstructed series: 



Xn 

fc=l 



n = l,...,N. (4) 



The reconstructed series produced by the elementary grouping will be called 

elementary reconstructed series. 

2.2. Separability and choice of parameters 

Separability of two time series X^'' and xj^-* reflects the possibility to 
extract xj^^ from the observed sum X5i^ + Xij\ SSA can approximately 
separate, for example, signal and noise, sine waves with different frequencies, 
trend and seasonality, and others. 

If the time series are approximately separable, the problem of identifica- 
tion of terms in (|2| corresponding to xj^'' arises. Time series components can 
be identified based on the following principle: the form of an eigenvector repli- 
cates the form of the time series component that produces this eigenvector. 
Thus, graphs of eigenvectors can help in identification. Moreover, sinusoids 
generate, exactly or approximately, two sine wave components with the same 
frequency and shift of phases equal to 7r/2. Therefore, scatterplots of pairs 
of eigenvectors can help to identify harmonics. 

Very helpful guess for separation is w-correlation matrix. This is the 
matrix consisting of weighted correlations between reconstructed time series 
components. Well separated components have small correlations, while badly 
separated components have large correlations. 

The conditions of (approximate) separability yield recommendations for 
the choice of window length L: it should be large enough (L ~ N/2) and if 
we want to extract a periodic component with known period, then the win- 
dow lengths which divides the period provide better separabihty. Choice of 



parameters is discussed in Golyandina et al. (2001) and Golyandina (2010). 



If the time series has a complex structure, so-called Sequential SSA is recom- 
mended. Sequential SSA consists of two stages, at the first stage the trend 
is extracted with small window length and then the periodic components are 
detected and extracted from the residual with L ~ A^/2. 
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2. 3. Forecasting 

The idea of forecasting is based on the possibihty to restore the last 
coordinate of a vector belonging to the subspace using the first coordinates 
of the same vector. 

Let / be the chosen set of eigentriples, Pi G R^, z G /, be corresponding 
eigenvectors, Pi be their first L — 1 coordinates, vTj be the last coordinate 
of Pi, = "^iTrf- Then recurrent forecasting is performed by application 
of the linear recurrence relation (LRR) to the reconstructed points of the 
forecasted time series component: 

^^Af+l = O-iXn + a'2XN-l + • • • + O-L-l^N-L+l, 

where the vector R = (a^.i, . . . , ai)""" can be expressed as 



R 

1 - 7/^ - 



This procedure can be repeated to obtain M-step ahead forecast. 

Here to obtain the reconstructed series we applied diagonal averaging and 



then applied the LRR. In vector forecasting (see for details Golyandina et al. 



( |2001| )) these steps are interchanged. The vector forecast is typically slightly 



more stable but it has much larger computational cost than the recurrent 
forecast. 

2.4- Linear recurrent relations, time series of finite rank and roots 

There is a class of series that admit exact continuation. Such series are 
governed by LRRs, their trajectory matrices are rank-deficient, for these 
series the number of non-zero terms in ^ does not depend on L, and so 
on. This class of series provides a natural model for SSA and especially for 
forecasting. Let us introduce it formally. 

Definition 1. A time series = {si}^^ is governed by a linear recurrence 
relation (LRR), if there exist ai, . . . , such that 

T 

Si+t = ^ OkSi+t-k, I < i < N - t, at 0, t < N. (6) 

k=l 

The number t is called the order of the LRR, ai, . . . ,at are the coefficients 
of the LRR. If t = r is the minimal order of an LRR that governs the time 
series Sn, then the corresponding LRR is called minimal. 
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Time series is called time series of finite rank r if its L-trajectory matrix 
has rank r for any L > r (we supposed that L < K). 

Note that if the minimal LRR governing the signal Sat has order r with 
r < N/2, then S^r has rank r. 

Definition 2. A polynomial Pt{^) = ^ — ^^=1 Q-fcyU*"'^ is called a charac- 
teristic polynomial of the LRR ^ . 

Let the time series Sqo = (-Si, • • • , . . .) satisfy the LRR ^ with 7^ and 
i > 1. Consider the characteristic polynomial of the LRR ([6]) and denote 
its different (complex) roots by yUi, . . . ,/ip with 1 < p < t. All these roots 
are non-zero as at 7^ 0. Let the multiplicity of the root Hm be km, where 
1 < m < p and ki + . . . + kp = t. 

It is well-known that the time series Soo = (si, . . . , s^, . . .) satisfies the 
LRR (g for alH > if and only if 

m=l \ j=0 J 

where the complex coefficients Cmj depend on the first t points Si,...,St. 
For real-valued time series, ([T]) implies that the class of time series governed 
by the LRRs consists of sums of products of polynomials, exponentials and 
sinusoids. 

Rank of the series is equal to the number of non-zero terms in Q. For ex- 
ample, an exponentially-modulated sinusoid s„ = Aexp (an) sm{2Trun + (p) 
is constructed from two conjugate complex roots /ii^2 = exp(a; ± i2Tiuj) = 
pexp{±i2Tru) if its frequency u G (0,0.5). Therefore, the rank of this 
exponentially-modulated sinusoid is equal to 2. The rank of the exponential 
is equal to 1, the rank of a linear function corresponding to the root 1 of 
multiplicity 2 equals 2, and so on. 

Note that if we know the roots generated by a series component, then 
we know its interpretable parameters. For example, the frequency of an 
exponentially-modulated sinusoid can be found using the argument of the 
corresponding complex roots, while root module p gives the exponential rate 
a = In p. 

If the LRR is not minimal, then only r of the roots correspond to the sig- 
nal. Other roots are extraneous and can influence the forecast. For example, 
extraneous roots that have modules larger than 1 can lead to instability. 
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2.5. Estimation of frequencies 

Let = Sj\f + M.]\f, where s„ = Yl]=i /^j and the series Sat and M.^ are 
approximately separable for a given window length L. As has been discussed 
above, the signal roots of the forecasting LRR allow estimating the signal 
roots fij, j = l,...,r. However, we should somehow distinguish between 
signal and extraneous roots. Usually, the signal roots have maximal modules 



(e.g. see Usevich (2010)). However, this is not necessary. Therefore, the 
methods that are able to separate the signal and extraneous roots, can be 
very useful. 



Let us describe one of these methods called ESPRIT (Roy and Kailath 



1989). Denote by / = the indices of the eigentriples which 

correspond to Sat (if Sat is the signal then I = {l,...,r}). Denote by 
Ur = [Ui-^ : . . . : Ui^.], Ur the matrix with the last row removed and 
is the matrix with the first row removed. Then /Xj can be estimated by the 
eigenvalues of the matrix UjU,-, where f means pseudo-inversion. Corre- 
spondingly, the estimated frequencies are the arguments of /Xj. 

There is an additional fast method of frequency estimation. This method 
is mostly used for the identification of the eigentriples at the Grouping step. 
The two eigenvectors produced by an exponentially-modulated sine wave 
have similar form and their phases differ by 7r/2. Let A and B be de- 
fined by an = p"' sm{2iTujn + 0) and 6„ = p"cos(2'7ra;n + 0). Then u = 

Z ((bi) ' (S)) Therefore, we can estimate the frequency using these 
two eigenvectors. Since the eigenvectors f/'-^^ and f/^^^ do not have exactly 
the same form as A and B, the sequence of angles ^ ( ( (2) 1 , ( (2)M ) /^^' 
i = 1 L — 1, can be considered and then the mean or median can be 



taken as an estimate of the frequency (see Golyandina et al. (2001) for de- 
tails). In Rssa, the median is considered and the median of deviations from 
the median is taken as a measure of accuracy. 

2.6. Bootstrap confidence intervals 

Let again Xjy = + M.]^ and let us describe the construction of boot- 
strap confidence intervals for the signal Sa^ and its forecast, based on the 
assumptions that the signal has rank r and the residuals are white noise. 
The algorithm consists of the following steps. 

• Fix L, I = {1, . . . , r}, apply SSA, reconstruct the signal and obtain the 
decomposition = Sn + ^n- 
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Fix Sjv, calculate the empirical distribution of the residual 



• Simulate M independent copies M^r j, i = 1, . . . , M, using the empirical 
distribution, construct Xjy^j = S^r + Mtv,*- 

• Apply SSA with the same L and / to Xjv^j, reconstruct the signal, then 
perform M-step ahead forecasting and obtain Sn+m,!, i = 1, • • • , Q- 

• For each time point j consider the sample Sj i = 1, . . . , Q, and con- 
struct 7-confidence interval as the interval between 7/2- lower and up- 
per sample quantiles. The sample mean is called average bootstrap 
forecast. 

2. 7. Choice of parameters for forecasting 

Basic rules for parameter choice in forecasting are generally the same as 
for the reconstruction. A considerable difference is that more stable recon- 
struction can be even more important for forecasting than for the reconstruc- 



tion accuracy. Also, simulations and theory (Golyandina, 2010) show that 
it is better to choose window length L smaller than half of the time series 
length A^. One of the recommended values is A^/3. 

As a rule, recommendations are valid if the series approximately satisfies 
the model of noisy time series governed by an LRR. Real-life time series 
always need an additional analysis. 

If the time series X^r is long and has stable structure, the technique of slid- 
ing forecasts can be applied. We can choose the length Ng of sliding subseries, 
fix the window length L and the group of indices /, choose the forecasting 
horizon and then perform forecasts of subseries Xj_j+7v^_i = (xj, . . . , Xj+at^-i), 
i = 1, . . . , N — Ns. The proper choice of L and / corresponds to small average 
mean square error (MSE) of forecasts. The choice of parameters allowing to 
obtain the minimal accuracy is not necessary the best, since, for example, 
the stability with respect to small changes of the window length may be more 
important. 

If the time series is long but its structure can be changing in time, then 
the estimation of forecast accuracy can help to understand how many of last 
points should be considered for forecasting. 

For checking the stability of forecasts the confidence intervals can be also 
useful. 
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3. Rssa package 



The main entry point to the package is new. ssa function which performs 
the Embedding step and (optionally, enabled by default) the Decomposition 
step. The function has the following signature: 

new.ssa(x, L, kind, svd. method, force . decompose = TRUE) 

Here x argument receives the input series. L specifies the window length 
(equals to half of the series length by default), kind argument selects between 
different SSA algorithms. In this paper we deal with SSA for one-dimensional 
time series and therefore consider mostly the option kind=" Id-ssa" and 
shortly kind="toeplitz-ssa" (option for multivariate SSA kind="2d-ssa" 
is not considered). Different variants of Singular Value Decomposition imple- 
mentations can be selected via svd. method argument. The implementations 



will be discussed later in section 3.1 With default value force . decompose 
= TRUE this function fulfills Decomposition stage of the algorithm. All other 
arguments are passed to decompose function. Usually this is neig argument 
which allows one to request the desired number of eigentriples to compute 
(such request can be ignored depending on the chosen SVD method). 

The result of the function is SSA object which is the input for the most 
of other functions in the package. The contents of the object can be viewed 
via summary function. 

The function reconstruct (this , groups) is used to perform Recon- 
struction stage. The first argument is the SSA object. The second argument 
specifies the eigentriple grouping ^ and should be a list of vectors of indices 
of elementary series Ij. 

The principle of automatic calculation of necessary objects is used in 
the implementation of the package. For example, if 10 eigentriples were 
calculated while decomposing, then the user yet can perform reconstruction 
by the first 15 components, since the decomposition will be automatically 
continued to calculate 11-15 eigentriples. Also, the routines try hard to 
reuse the results of the previous calculations in order to save time (hence 
the cache argument of many routines). For example, the elementary series 
once calculated are stored inside the SSA object, so next time reconstruct 
function might not need to calculate the resulting series from scratch. Also, 
since SSA objects tend to occupy a decent amount of RAM, the functions and 
data structures were designed to minimise the amount of memory copying. 
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Such efficient memory bookkeeping and invisible caching of the interme- 
diate results puts additional semantic burden on SSA object. In particular, 
SSA objects effectively are references and thus cannot be copied freely via the 
standard assignment operator <-. Instead deep copy function clone should 
be used. The internal cache can be freed via cleanup routine. 

The internals of SSA object can be examined with the use of $ operator. 
In particular, the following fields related to the expansion [2] can be extracted 
out of SSA object: lambda contains the eigenvalues (Pj) of the trajectory 
matrix, U is a matrix with eigenvectors in columns and V (might be null) is 
a matrix of factor vectors (Qi/||<5i||). 

3.1. SVD methods 

In many cases only few leading eigentriples are of interest for SSA analysis. 
Thus the full Singular Value Decomposition of the trajectory matrix can yield 
big computational and memory space burden (here we consider the option 
type=" Id-ssa"). Instead, the so-called Truncated SVD can be used and only 
a number of desired leading eigentriples can be computed. Four different SVD 
implementations are available in Rssa and can be specified via svd. method 
argument of new.ssa function: 

• "nutrlan" — Truncated SVD via thick restarted Lanczos bidiagonal- 



ization algorithm (Yamazaki et al. , 2008). The method internally cal 



culates the eigenvalues and eigenvectors of the matrix XX""". Factor 
vectors are calculated on-fly during Reconstruction step when neces- 
sary. 

"propack" — Implicitly restarted Lanczos bidiagonalization with par- 



tial reorthogonalization (Larsen 1998). The method calculates the 



truncated SVD of the trajectory matrix X (and thus calculates the 
factors vectors as well). 

"eigen" and "svd" — Full decomposition of the trajectory matrix us- 



ing either eigendecomposition or SVD routines from LAPACK (Anderson 



et al. , 1999). These are basically the straightforward implementations 
of the basic SSA algorithm without any additional computational- and 
space complexity reductions via additional sophisticated algorithms. 
Note that both methods perform full decompositions and thus neig 
argument (which allows one to request desired number of eigentriples) 
is silently ignored for these methods. 
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Proper selection of SVD method is a bit of art. However, there are several 
easy rules of thumb which work perfectly in most situations. 

First, it is unwise to use the Lanczos-based truncated SVD methods if the 
trajectory matrix is small or "wide" . This corresponds to small series lengths 
(say, N < 100) or small window lengths (L < 50). Also, it is unwise to ask 
for too many eigentriples: when more than half of window length eigentriples 
are desired it is better to use full SVD instead of truncated. 

SVD method eigen works best for small window lengths since in this case 
the eigendecomposition of small matrix needs to be computed. 

Usually the propack method tends to be slightly faster and more nu- 
merically stable than nutrlan, however might yield considerable memory 
consumption when factor vectors tend to be big. For example, for time se- 
ries of length approx 87000 and window length 43500 the decomposition with 
nutrlan method took 16 seconds and 13 seconds via propack. The memory 
consumption for the latter method is as twice as the consumption of the 
former. Such difference is mostly important for multivariate version of SSA 
and should not be a problem in our case. 

By default nutrlan method is selected. However, new.ssa function tries 
to correct the selection, when the use of the method might be unsafe. In 
particular, for short series, small window length or big number of desired 
eigentriples the eigen method is automatically selected. 

3.2. Efficient implementation 

All the computation algorithms in the package are written with compu- 



tation speed in mind. The details of the algorithms can be found in (Ko- 



robeynikov 2010). Here we outline the computation complexities of different 



SSA stages and the algorithms used. 
3.2.1. Basic SSA 

We should explicitly distinguish between specialised Lanczos-based SVD 
methods (nutrlan and propack) and generic SVDs (svd and eigen). The 
former can be made to exploit the special Hankel structure of the trajectory 
matrix and thus reduce the computational and space complexity of all the 
algorithms. 

1. Generic SVD methods: 

(a) The Embedding step naturally has negligible computational com- 
plexity. Its space complexity is 0{LK). The worst case for generic 
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SVD methods coincides with L being equal to N/2 and the storage 
complexity is 0{N'^). 
(b) The Decomposition step computational complexity is 0{L^+L'^K) 



for eigen method and 0{L'^K+LK'^+K^) for svd method (Golub 



and Van Loan, 1996). So, in the worst case L ~ N/2 the compu- 
tation complexity is 0{N^) for both SVD methods, 
(c) The computational complexity of Reconstruction stage depends 
on the upper bound of number of elementary series used. Let us 
denote such bound as k. Then the complexity of this stage is 
0{kLK + kN) with the worst case being OikN"^). 
2. Lanczos-based SVD methods: 

(a) The Embedding step has O(A^logA^) computational and 0{N) 
storage complexity. The increased complexity is due to additional 
preprocessing necessary for efficient algorithms used during the 
Decomposition and Reconstruction steps. Note that no trajectory 
matrix is computed there, instead the representation via so-calle 
Toeplitz circulant is used. 

(b) The major speed-up can be seen during Decomposition step, since 
both truncated SVD and the special Hankel structure of the tra- 
jectory matrix contributes to the computation complexity. In par- 



ticular, it can be shown (Korobeynikov, 2010) that the multipli 



cation of Hankel matrix by a vector can be viewed as a special 
case of convolution. The latter can be efficiently calculated by the 
means of the Fast Fourier Transform (FFT). 
If k eigentriples should be computed, then the complexity of such 
Hankel Truncated SVD is 0{kN log N+k'^N) and does not depend 
on the window length, 
(c) The Reconstruction stage can be viewed as forming the elemen- 
tary series and then — summing some of them depending on 
the grouping chosen. The computation of each elementary se- 
ries, which is rank 1 hankelisation can again be viewed as special 
form of convolution. Thus the FFT-using Reconstruction is done 
in 0{kN log N) operations. 

All this explains the automatic choice of SVD method described in the 
previous section. From the comparison of the implementations we are seeing 
that Lanczos SVD-based methods work best when the window length L is 
large and series length is not too small. Therefore, Lanczos SVD-based 
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methods make it possible to achieve better separabihty by mean of use of 
large window length. 



3.2.2. Forecasting 

Efficient and stable implementation of the forecasting routines are neces- 
sary in order not to produce forecast itself but also to study the structure of 
the series. 

First, we should mention the procedure which calculates the roots of the 
characteristic polynomial of the LRR. The task itself looks standard: we 
have to calculate all the roots of the polynomial of the degree equals to 
window length L. Unfortunately, the standard R function polyroot which 



implements the classical Jenkins- Traub algorithm (Jenkins and Traub, 1970) 



produces imprecise results for roots of characteristic polynomials of LRRs. 
In order to get rid of the precision loss, the roots are derived via explicit 
eigenvalues calculation of the polynomial companion matrix. 

Another computation-intensive routine is vector forecast. The idea of 
the method itself is quite simple: the resultant matrix of the reconstructed 
series should be extended (by adding columns) while keeping the rank. The 



classical algorithm as in (Golyandina et al. , 2001) involves the calculation of 



the projections onto the space spanned by selected eigenvectors. For p-step 
ahead forecasting, the complexities of doing such projections are 0{k{p + 
L)L'^), where k is number of eigenvectors used for reconstruction. However, 
the problem of vector forecast can be reduced to the ESPRIT-like system of 
linear equations. The effective solution of such system of equations according 



to (Badeau et al. , 2005 ) allows to reduce to complexity down to 0{k {p+ L)). 



4. Basic SSA with R 

4.1. Typical code 

Let us consider the standard "co2" time series available from every R in- 
stallation. The series depicts atmospheric concentrations of C02 from Mauna 
Loa Observatory, Hawaii and contains 468 observations, monthly from 1959 



to 1997 (Keeling and Whorf, 1997) 



Code fragment |4.1| presents the typical code for construction of the time 
series decomposition. 
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Fragment 4.1: "co2": typical code of SSA analysis 

library (Rssa) 

# Decomposition stage 

s <- new.ssa(co2, L = 120) 

# Default value for window length L = (length(co2) + 1) / 2 

# Reconstruction stage 

# The results are the reconstructed series r$Fl, r$F2, and r$F3 
r <- reconstruct (s , groups = list(c(l,4), c(2, 3), c(5, 6))) 

# Calculate the residuals 
res <- residuals (r) 

The above code does not answer the question how to set groups to obtain 
reasonable result. Proper grouping can be done looking on the diagnostic 
plots. First, plot can be called on SSA object itself. Here type argument 
can be used to select different plots available: 

1. "values" depicts eigenvalues (default); 

2. "vectors" shows ID graphs of eigenvectors to detect trend components 
and saw-tooth component (if any); 

3. "paired" shows 2D graphs of eigenvectors to detect sine waves; 

Second, function wcor being applied to SSA object calculates w-correlation 
matrix for elementary reconstructed components. It can be plotted in the 
standard way via plot (wcor (s) ) . Such picture can be used to determine 
the separability points. 



The use of these functions is summarized in the code of Fragment |4.2 
Fragment 4.2: "co2": diagnostic plots 

plot(s) # Eigenvalues 

plot(s, type = "vectors") # Eigenvectors 
plot(s, type = "paired") # Pairs of eigenvectors 
plot (wcor (s) ) # w-correlation matrix plot 

The result of the reconstruct function is at the same time a list with 
components Fl, F2, . . . , which contain the reconstructed series and the re- 
construction object^ which can be conveniently plotted to see the result of 
reconstruction. 

The plot method for reconstruction object has two main arguments 
which configures the view of the resulting figure. 

1. plot. method argument might be "matplot" or "native". In the for- 
mer case all plotting is done via standard matplot function call. In the 
latter case the native plotting method of time series object is used for 
plotting (provides best results for e.g. ts objects). 
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2. type depicts whether raw reconstructed series are to be plotted (ar- 
gument value "raw") or cumulative series r$Fl, r$Fl+r$F2, and so on 
(argument value "cumsum"). 

So, in our case one can look at plot(r, plot. method = "native") for 
all reconstructed time series separately, and plot(r, type = "cumsum", 
plot. method = "native") for cumulative series. 

The groups argument of wcor function can be used to specify the group- 
ing used for reconstruction. The plot of such w-correlation matrix can be 



used to check the quality of separability. See Fragment |4.3| for example. 

Additional logical arguments add. residuals and add. original can be 
used to add the residuals and original series to reconstruction plots. This 
way one can generate the picture containing the decomposition into the sum 
of trend, seasonality and noise. The result is depicted in Fig. [1} 

Fragment 4.3: "co2": reconstruction plots 

# w-correlation matrix for reconstruction 
plot(wcor(s, groups = list(c(l,4), c(2,3), c(5, 6)))) 

# Decomposition into trend + seasonality and noise 
plot(r, plot. method = "native", 

add. original = TRUE, add. residuals = TRUE) 

Certainly, this resultant decomposition of the observed time series looks 
like a trick, since we have not explained how were the window length and 
the grouping chosen. For "co2" series this can be done very easily and we 



address the reader to the books (Golyandina et al. , 2001 



Golyandina and 



Zhigljavsky, 2012) for detailed information and Section 2.2 for short descrip- 
tion of the principles of parameter choice. In fact, "co2" series does contain 
two additional sine-wave components, which are hidden inside the residuals. 
We leave the procedure of finding these components as an exercise for the 
reader. 

Below we consider a more complicated example with explanation of pa- 
rameter choice, using the two-stage Sequential SSA. 

4-2. Case study 

Let us analyse the time series "Motor Vehicle" which contains the monthly 



data about total domestic and foreign car sales in the USA (U.S. Bureau of 



Economic Analysis, 2010). 

We start with the code resulting in the time series decomposition, then 
show the graphs and comment on the logic of the investigation. 
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First, we will assume that Rssa package is already loaded. The series 
is available from the package and can be loaded via data(MotoVehicle) 
command. Total series length is 541. 

Fig. |4] shows that the form of trend is complex. This causes impossibility 
to obtain the full decomposition of the time series at once. Therefore, let 
us perform the decomposition sequentially. First, let us extract trend. Since 
for such form of the trend its extraction is similar to smoothing, we start 
with choosing minimally possible window length equal to 12. The reason for 
choice of window length is similar to that in moving averaging procedure: 
for smoothing the time series containing a periodic component, the window 
length should be divided by the period. 



Fragment |4.4| performs decomposition and displays the information about 
the resulting ssa-object. 

Fragment 4.4: "MotorVehicle" , 1st stage: decomposition 
s <- new. ssa(MotorVehicle, L=12) 

# Look inside 's' object to see, what is available, 
summary (s) 

This is the example of the output of summary (s): 
Call: 

new.ssa(x = MotorVehicle, L = 12) 

Series length: 541, Window length: 12, SVD method: eigen 
Computed: 

Eigenvalues: 12, Eigenvectors: 12, Factor vectors: 

Precached: elementary series (0 MiB) 

Overall memory consumption (estimate): 0.005791 MiB 

The SVD method "eigen" was chosen by default, since the window length 
is small and therefore fast SVD methods are not effective. Since the pre- 
caching is implemented in Rssa, it is important to know what elements are 
already calculated. You can see that there are 12 eigenvectors and elemen- 
tary reconstruction components. 



Now let us look at the decomposition results in Fragment 4^ for compo- 
nent identification. 
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Fragment 4.5: "MotorVehicle" , 1st stage: visual information for 
grouping 

# Plot of eigenvalues 
plot (s) 

# Plot of eigenvectors 

# Here 'idx' argument denotes the indices of vectors of interest 
plot(s, type = "vectors", idx=l:6) 

# Plot of elementary reconstructed series 

# Here 'groups' argument specifies the grouping 
plot(s, type = "series", groups = as. list (1:6)) 

Note that plot of eigenvalues does not need additional calculations due 
to pre-caching, while the plot of elementary reconstructed components needs 
additional time for calculations (though such calculations are perfomed only 
once for given set of elementary components). The repeating call of summary (s) 
shows that Precached: 6 elementary series (0.02497 MiB). 

The graph of eigenvectors is not informative here and just reflects the 
large contribution of the leading eigentriple. Fig. |2] shows the form of the 
six leading eigenvectors. The leading eigenvector has almost constant co- 
ordinates and therefore it corresponds to pure smoothing by Bartlett filter 
(see Golyandina et al. (2012) and Golyandina and Zhigljavsky (2012)). The 
result of reconstruction by each of six eigentriples is depicted in Fig. |3| Both 
figures confirm that the first eigentriple corresponds to the trend, the other 
eigentriples contains high-frequency components and therefore are not re- 
lated to the trend. The trend from Fig. |4] is exactly the trend produced by 
one leading eigentriple and coincides with the first reconstructed components 
in Fig. [3j The trend can be reconstructed by the code from Fragment 4.6 

Fragment 4.6: "MotorVehicle", 1st stage: reconstruction 

resl <- reconstruct (s , groups = list(l)) 
trend <- resl$Fl 

We have extracted the trend and the next stage is the extraction of season- 
ality from the residual obtained by res .trend <- residuals (resl) . First, 
look at the periodogram (Fig. [S]) by call spec .pgram (res .trend, detrend 
= FALSE, log = "no"). We see that the seasonality consists of sine waves 
with periods 12, 6, 4, 3, 2.4. Let us extract them by SSA. 

For better separability, we take the window length L = 264 as maximal 
window length, which is not greater than half of time series length and is 
divided by 12. 
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Figure 2: "Motor Vehicle" , 1st stage: eigenvectors {L — 12) 

Fragment 4.7: "MotorVehicle" , 2nd stage: decomposition and vi- 
sual information 
s <- new. ssa(res . trend, L=264) 
plot (s) 

plot(s, type = "paired", idx = 1:12, plot.contrib = FALSE) 

# Calculate the w-correlation matrix using first 30 components. 

# Here groups argument as usual denotes the grouping used, 
w <- wcor(s, groups = as . list (1 : 30) ) 

plot (w) 

The code summary (s) shows that SVD method: nutrlan and the number 
of calculated eigenvalues and eigenvectors is equal to 50 (default). 

For proper identification of the sought sine waves, we will use the graph 
of eigenvalues, scatterplots of eigenvectors and w-correlation matrix of the 
elementary components. In Fig. |6] we see several steps produced be approxi- 
mately equal eigenvalues. Each step is likely yielded by a pair of eigenvectors 
which correspond to a sine wave. Fig. [7] confirms our guess. One can see 6 
almost regular polygons. ETl-2, ET3-4, ET5-6, ET7-8 and ET9-10 corre- 
sponds to periods 12, 6, 2.4, 3, 4, which are produced by the seasonality and 
are totally explained by the periodogram Fig. |5} The components are ordered 
in accordance to ordering of the periodogram values at these frequencies. 
Fig [8] shows that these pairs are highly correlated within and are almost not 
correlated between. Note that there is one more pair of eigentriples ETll-12 
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Figure 3: "Motor Vehicle" , 1st stage: elementary reconstructed series {L = 12) 

which satisfies the same properties. Since this pair corresponds to the period 
16, which is not interpretable for monthly data, we refer it to noise. The 
estimation of periods was performed by the function parestimate and the 
results are 

> parestimate (s , 1:12, method = "esprit-ls")$periods 

[1] 2.996167 -2.996167 12.008794 -12.008794 2.398670 -2.398670 
[7] 16.198097 -16.198097 5.982904 -5.982904 4.014053 -4.014053 

> parestimate (s , 11:12, method = "pairs") 
[1] 15.9677 

Let us present the results of the decomposition. 
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Figure 4: "Motor Vehicle" , 1st stage: initial series and estimated trend {L — 12, ETl) 

Fragment 4.8: "MotorVehicle", 2nd stage: reconstruction and plot- 
ting of the results 

res2 <- reconstruct (s , groups=list (1 : 10) ) 
seasonality <- res2$Fl; 
res <- residuals (res2) ; 

# Extracted seasonality 

plot(res2, plot. method = "native", add. original = TRUE, 
col = c( "black", "red")) 

# Result of Sequential SSA 

plot(res2, base. series = resl, plot. method = "native", 
add. original = TRUE, add. residuals = TRUE) 

# Seasonally adjusted series 

plot (MotorVehicle-seasonality , type='l' ) 



The extracted seasonality (ETl-10) is depicted in Fig[9j Slowly change 
of sine wave phases seen in Fig. [7] yields the periodic behaviour of complex 
form. Fig. [TO] shows the resultant decomposition of both two stages of Se- 
quential SSA. Note that the obtained noise residuals are heterogeneous. As 



an auxiliary result, we obtain also seasonally adjusted series (Fig. 11). 

Finally, let us demonstrate how to estimate the variance of the heteroge- 
neous noise. The procedure is based on two observations: first, the variance 
is equal to expectation of squared residuals; second, for stochastic process 
the trend is its expectation. Therefore, the variance can be estimated as the 
trend of squared residuals. This trend can be extracted by SSA with small 
window length and reconstructed by the leading eigentriple. The choice of 
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Figure 5: "Motor Vehicle" , 2nd stage: periodogram of the series (i.e., of the residual at the 
1st stage) 



window length effects on the detail level of the trend. The choice L = 30 



provides appropriate trend. The result of Fragment 4.9 is depicted in Fig. 12 
containing the residuals with standard deviation bounds. 
Fragment 4.9: "MotorVehicle" : finding noise envelope 

s <- new. ssa(res"2, L=30) 

rsd <- sqrt (reconstruct (s , groups=list (1) ) $F1) 

plot(res, type='l'); lines(rsd, type='l'); lines(-rsd, type='l') 

Remark 1. For stationary time series Toeplitz SSA is appropriate. In the 
described example, it has no sense to apply Toeplitz SSA for trend extraction. 
Generally, it can be applied to extraction of seasonality; it can be made via 
the calls <- new. ssa(res .trend, L=264, kind="toeplitz-ssa") How- 
ever, the result of decomposition is worse, since the seasonal behaviour is 
changing in time. Note that the ordering of eigentriples by eigenvalues of the 
matrix S and their contribution to the decomposition differ. The values of 
sSlambda are equal to the contribution values, while the ordering is performed 
by eigenvalues. Therefore the graph plot(s) can be not-monotonic. 



5. SSA forecasting with R 
5.1. Typical code 

After the decomposition has been fulfilled, the forecasting become avail- 
able. Rssa implements two methods of forecasting, recurrent and vector 
forecasts. 



23 




Figure 6: "Motor Vehicle" , 2nd stage: eigenvalues (L — 264) 

Fragment 5.1: "co2": forecasting 

# Decomposition stage 

s <- new.ssa(co2, L = 120) 

# Recurrent forecast, the result is the forecasted values only 

# The result is the set of forecasts for each group 

fori <- rforecast(s, groups = list(l, c(l,4), 1:4, 1:6), len = 12) 
matplot(data.frame(forl) , type='b' , pch = c ( ' 1 ' , '2 \ ' 3 ' , '4' ) ) 

# Recurrent forecast, the forecasted points added to the base series 
forla <- rforecast(s, groups = list(l, c(l,4), 1:4, 1:6), len = 12, 

only. new = FALSE) 
#plot of the forecast based on the second group c(l,4) 
matplot(data.frame(c(co2, rep(NA,36) ) ,f orla$F2) , type='l') 

# Vector forecast 

for2 <- vforecast(s, groups = list(l:6), len = 12) 

# Confidence intervals, they can be calculated for one group only 
forS <- bforecast(s, group = 1:6, len = 12, type = "recurrent") 
matplot(for3, type="l", col=c("black" , "red" , "red")) 



The forecast for trend (ETl and ET4) is sliown on Fig. 13 together with 
the initial series. 

Together with forecasting, this block of Rssa functions provides tools for 



analysing of the forecasting linear recurrence relation (Fragment 5.2 The 



roots are ordered by modulus, since typically (but not necessary) the signal 
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Figure 7: "Motor Vehicle" , 2nd stage: scatterplots for eigenvector pairs (L = 264) 



roots have maximal modulus (see Usevich (2010) for theoretical results about 
signal and extra roots). 

Fragment 5.2: "co2": linear recurrence relation 
num <- 1:6 

Irr. coef<-lrr(s, group = num) 
r <- roots (Irr . coef) 

# Plot of roots against the unit circle 
plot (Irr . coef) 

To obtain explicit form of forecasting formula, we can use as signal roots 
of the characteristic polynomial of LRR, as special methods of estimation. 



For parameter estimation (frequency and damping rate) code of Fragment 5.3 
can be used. 

Fragment 5.3: "co2": parameter estimation 

print (2*pi/Arg (r [num] ) ) 
print (Mod(r [num] ) ) 

parestimate(s, 1:6, method = "esprit-ls") 
parestimate(s, c(2:3,5:6), method = "esprit-ls") 

The result of estimation through the roots is 
> print (2*pi/Arg(r [num] ) ) 

[1] 5.999366 -5.999366 11.996071 -11.996071 Inf Inf 
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Figure 8: "Motor Vehicle" , 2nd stage: w-correlation matrix {L = 264) 



> print (Mod (r [num] ) ) 

[1] 1.000575 1.000575 1.000385 1.000385 1.000354 0.985554 



By sense, all roots are related to the signal. Results of ESPRIT method are 
confirm it. This means that the explicit form of the forecast is the sum of 
half of year and annual sine waves with almost constant amplitude and also 
the trend approximated by a sum of two exponentials. 

To find proper parameters of the method, the testing of the forecasting 
can be performed. The code from Fragment 5.4 shows how to implement the 
function to see the dependence of the forecasting accuracy on window length 
and on number of the selected components. 
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Figiirc 9: "MotorVchiclo" , 2nd stage: the series and the extracted seasonal component 

Fragment 5.4: Function for sliding forecasts 

forecast . check <- f unctioiiCF, 

groups , 

forecast. len = 1, sliding. len = N °/o/°/« 4, 
. . . , 

type = c ( "recurrent " , "vector")) { 

type <- match. arg (type) 
N <- length (F) 

K. sliding <- N - sliding. len - forecast. len + 1 

r <- matrix (nrow = K. sliding, ncol = length (groups) ) 

f .fun <- if (identical (type, "vector")) vforecast else rforecast 

for (i in 1:K. sliding) { 

F. train <- F[seq(from = i, len = sliding. len)] 

F. check <- F[seq(from = sliding. len + i, len = forecast . len)] 

s <- new. ssa(F. train, ...) 

for (idx in seq_along (groups) ) { 
group <- groups [ [idx] ] 

f <- f.fun(s, groups = list(group), len = f orecast . len) [ [1] ] 
r[i, idx] <- mean((f - F. check) "2) 

> 

} 

apply (r, 2, mean) 

} 
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Figure 10: "Motor Vehicle" : Series and its trend-periodic-residuals decomposition 



Fragment 5.5 contains the examples how to use shding forecasts for choice 
of parameters. Commented hnes shows other reasonable choice for the cor- 
responding variables. Length of sliding subseries equals 360, while the series 
length is equal to 468. Choice f 1 <- 1 corresponds to 108 one-step ahead 
forecasts (short-term forecasting), choice f 1 <- 108 corresponds to one 108- 
step ahead forecast (long-term forecasting). 
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Figure 11: "Motor Vehicle" : Seasonally adjusted series 

Fragment 5.5: "co2": dependence of forecast accuracy on choice of 
parameters 

Lmin <- 24 

N <- length(co2) 

ns <- 360 

# fl <- 1 
fl <- N-ns 

# groups <- list(l:6, 1:10, 1:15, 1:20) 

groups <- list(c(l,4), 1:4, 1:6, c(l:6, 14, 15)) 
Lseq <- seq(Lmin, ns-Lmin, by = 6) 

fcL <- function(L) { 

forecast . check(co2, groups, 

forecast . len = 1, sliding. len = ns, 
L = L, neig = 20, type=" vector") 

> 

m <- sapply(Lseq, fcL) 
matplot (time (co2) [Lseq] , t(m) , 

type = "1", col=c("red", "green", "blue", "black")) 

5.2. Case study 

Let us consider tlie same example "Motor Vehicle" . Since the trend has 
complex structure, it has sense to forecast trend and seasonality separately. 

We start with forecasting the seasonality. Fragment |5 . 6 1 p erforms forecasts 
by recurrent and vector methods. 
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Figure 12: "Motor Vehicle" : Residuals with envelopes 
Fragment 5.6: "MotorVehicle" seasonality: forecasting 

s <- new. ssa(MotorVehicle, L=12) 

trend <- reconstruct (s , groups = list(l)) 

res. trend <- residuals (trend) 

trend <- trend$Fl 

s <- new. ssa(res . trend, L=264) 

free <- rforecast(s, groups = list(l:10), len = 60)$F1 
fvec <- vforecast(s, groups = list(l:10), len = 60)$F1 
matplot (cbind(frec, fvec), type='l') 

The results are stable enough and difference between recurrent and vector 
forecasts are very small. To estimate the forecasting error, the bootstrap 
confidence intervals for the forecasted component can be calculated. This 



can be made with the help of function bf orecast, see Fragment 5.7 



Fragment 5.7: "MotorVehicle" seasonality: bootstrap confidence 
intervals 

fboot <- bforecast(s, group = 1:10, len = 60, type = "recurrent") 
matplot (f boot, type="l", col=c("black" , "red" , "red")) 

The confidence bounds are depicted in Fig. [l4j Certainly, at least ap- 
proximate normality and independence of residuals should be checked before 
applying confidence intervals based on these assumptions. 

Let us check, if the removing of the starting period of time series can 
improve accuracy. 



30 




Figure 13: "co2" : trend forecast 

Fragment 5.8: "MotorVehicle" seasonality: dependence of forecast 
accuracy on number of removed points 

N <- length(MotorVehicle) 

groups <- list (1:6, 1:8, 1:10, 1:30) 

Nstart <- seq (1, 241, 20); 

fcT <- function(NN) { 

forecast . check (res .trend [NN : N] , groups, 

forecast . len = 12, sliding. len = 240, 

L = 120, type= "re current " , svd.niethod="eigen") 

> 

m <- sapply (Nstart , fcT) 

matplot (time (MotorVehicle) [Nstart] , t(m), 

type = "1", col=c("red", "green", "blue", "black")) 



Fragment 5^ uses the function defined in Fragment |5.4[ One can see 
in Fig. 15 that it is better to use the whole time series and perform the 
forecasting based on ETl-8 or ETl-10. 

Totally different situation takes place with the trend forecast. Since the 
trend possibly has a structure changing in time, it is not reasonable to use 
the whole trend for forecasting. However the question is what was the last 
point of the structure change. 

Probably, the last point of structure change is 2009 (crisis)]^ Therefore, 



The points of the structural change need to be studies additionally either via SSA 
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Figure 14: "Motor Vehicle" seasonality: bootstrap confidence intervals 
we can forecast using the data from the last 3 years, that is, the last 36 points 



(Fragment 5.9) 



Fragment 5.9: "MotorVehicle" trend: forecasting of last 3 years 
behaviour 

s <- new. ssa(trend[506 : 541] ) 

free <- rforecast(s, groups = list(l), 

len = 24, only. new = FALSE) $F1 
matplot (cbind(c (trend [506: 541] , rep(NA,24)), free), type='l') 



If we consider longer time period for forecasting (Fragment 5.10), then 



we will see totally different forecast (Fig. 16). 

Fragment 5.10: "MotorVehicle" trend: forecasting of last 22 years 
behaviour 

s <- new.ssa(trend[270:541]) 

free <- rforecast(s, groups = list (1:4), 

len = 24, only. new = FALSE) $F1 
matplot (cbind(c (trend [270: 541] , rep(NA,24)), free), type='l') 

To choose the proper forecast, additional macroeconomic analysis is nec- 
essary. 



heterogeneiety matrix or using other methods of change-point analysis. 
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Figure 15: "MotorVehicle" seasonality: dependence of forecast accuracy on number of 
removed points 

6. Future 

We consider the current version of the Rssa package as an efficient imple- 
mentation of most of SSA algorithms for one-dimensional time series, which 
contains also many visual tools for proper SSA parameter choice and exam- 
ination of the results. 

Let us enumerate the points that are expected to be improved and ex- 
tended in next versions of Rssa. First, fast implementation of the numeric 
SVD needs improving for better stability. Also, not all of known SSA op- 
tions are included to the current version. In particular, centering is not 
implemented. Also, SSA-ICA, Monte Carlo SSA, and many other versions 
can be helpful. It is not possible to implement all of them. Therefore, we 
plan to extend the package preferably according to our methodology of SSA. 

Remark that the current version of the Rssa contains multivariate exten- 
sions, however, they need further improving and development. 
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Figure 16: "MotorVehicle" trend: forecasting of last 3 and 22 years behaviour 
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